ABSTRACT

A method for tracking a predetermined, two-dimensional
portion of an image throughout a sequence of images, the
method comprises the steps of selecting a reference frame;
selecting the predetermined, two-dimensional portion within
the reference frame by choosing a polygon that defines the
boundary of the predetermined portion; fitting a reference
mesh having at least three corner nodes and an inside node
to the reference polygon; tracking the reference polygon in
subsequent or previous image frames by tracking the corner
nodes; mapping the reference mesh into the tracked polygon
in the subsequent or previous image frames; and refining
locations of the inside and corner nodes in the subsequent or
previous image frames for tracking local and global motion
of the predetermined portion.

13 Claims, 12 Drawing Sheets 
METHOD FOR REGION TRACKING IN AN IMAGE SEQUENCE USING A TWO-DIMENSIONAL MESH

FIELD OF INVENTION

The present invention is related to the field of digital image processing and analysis and, more specifically, to a technique for tracking a two-dimensional portion of an image, feature of an image, or a particular object for two-dimensional images that are sequentially placed in chronological order for display.

BACKGROUND OF THE INVENTION

In a wide variety of image sequence processing and analysis tasks, there is a great need for an accurate method for tracking the intensity and motion of a portion of an image throughout an image sequence. This portion, called the reference region hereinafter, may correspond to a particular object or a portion of an object in the scene.

Tracking the boundary of an object has been discussed in M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active Contour Models", International Journal of Computer Vision, volume 1, no. 4, pp. 321-331, 1988; F. Leymarie and M. Levine, "Tracking Deformable Objects in The Plane Using An Active Contour Model", IEEE Transactions Pattern Analysis and Machine Intelligence, volume 15, pp. 617-634, June 1993; K. Fujimura, N. Yokoya, and K. Yamamoto, "Motion Tracking of Deformable Objects By Active Contour Models Using Multiscale Dynamic Programming", Journal of Visual Communication and Image Representation, vol. 4, pp. 382-391, December 1993; B. Bascle, et al., "Tracking Complex Primitives in An Image Sequence", in IEEE International Conference Pattern Recognition, pp. 426-431, October 1994, Israel; F. G. Meyer and P. Bouthemy, "Region-Based Tracking Using Affine Motion Models in Long Image Sequences", CVGIP: Image Understanding, volume 60, pp. 119-140, September 1994, all of which are herein incorporated by reference. The methods disclosed therein, however, do not address the tracking of the local deformations within the boundary of the object.

Methods for tracking local deformations of an entire frame using a 2-D mesh structure are disclosed in J. Niewglowski, T. Campbell, and P. Haavisto, "A Novel Video Coding Scheme Based on Temporal Prediction Using Digital Image Warping", IEEE Transactions Consumer Electronics, volume 39, pp. 141-150, August 1993; Y. Nakaya and H. Harashima, "Motion Compensation Based on Spatial Transformations", IEEE Transactions Circuits and System Video Technology, volume 4, pp. 339-357, June 1994; M. Dudon, O. Avaro, and G. Eud, "Object-Oriented Motion Estimation", in Picture Coding Symposium, pp. 284-287, September 1994, CA; C.-L. Huang and C.-Y. Hsu, "A New Motion Compensation Method for Image Sequence Coding Using Hierarchical Grid Interpolation", IEEE Transactions Circuits and System Video Technology, volume 4, pp. 42-52, February 1994, all of which are herein incorporated by reference. However, these methods always include the whole frame as the object of interest. They do not address the problem of tracking an individual object boundary within the frame.

U.S. Pat. No. 5,280,530, which is herein incorporated by reference, discusses a method for tracking an object within a frame. This method employs a single spatial transformation (in this case affine transformation) to represent the motion of an object. It forms a template of the object, divides the template into sub-templates, and estimates the individual displacement of each sub-template. The parameters of the affine transformation are found from the displacement information of the sub-templates. Although this method employs local displacement information, it does so only to find a global affine transformation for representing the motion of the entire object. Therefore, while it tracks the global motion of an entire object, it cannot track any deformations that occur within the object (i.e., local deformations).

Although the presently known and utilized methods are satisfactory, they are not without drawbacks. In addition to the above-described drawbacks, they also do not take into account the effects of frame-to-frame illumination changes.

Consequently, a need exists for an improved tracking technique that can track objects within a scene which are undergoing local deformations and illumination changes.

SUMMARY OF INVENTION

The present invention provides an improvement designed to satisfy the aforementioned needs. Particularly, the present invention is directed to a method for tracking a predetermined, two-dimensional portion of an image throughout a sequence of images, the method comprising the steps of (a) selecting a reference frame; (b) selecting the predetermined, two-dimensional portion within the reference frame by choosing a polygon that defines the boundary of the predetermined portion; (c) fitting a reference mesh having at least three corner nodes and an inside node to the reference polygon; (d) tracking the reference polygon in subsequent or previous image frames by tracking the corner nodes; (e) mapping the reference mesh into the tracked polygon in the subsequent or previous image frames; and (f) refining locations of the inside and corner nodes in the subsequent or previous image frames for tracking local and global motion of the predetermined portion.

BRIEF DESCRIPTION OF DRAWINGS

In the course of the following detailed description, reference will be made to the attached drawings in which:

FIG. 1 is a perspective view of a computer system for implementing the present invention;

FIGS. 2A and 2B are flowcharts for the method of the present invention;

FIGS. 3A and 3B are diagrams illustrating the method of FIGS. 2A and 2B;

FIG. 4 is an exploded view of a portion of FIG. 3;

FIG. 5 is a diagram illustrating the corner tracking method of FIGS. 2A and 2B;

FIGS. 6A, 6B, 7 and 8 are diagrams further illustrating the corner tracking method of FIGS. 2A and 2B;

FIG. 9 is a diagram illustrating the method for mapping a reference mesh of FIGS. 2A and 2B;

FIGS. 10A, 10B, 11, 12A and 12B are diagrams depicting a method for refining the location of inside nodes of the reference mesh;

FIG. 13 is a diagram illustrating a logarithmic search method for refining the location of an inside node;

FIG. 14A-14C is a diagram further illustrating the method of FIG. 13;

FIG. 15 is a diagram illustrating a logarithmic search method for refining the location of a boundary node;

FIG. 16 is a flowchart illustrating the method of FIG. 15;

FIG. 17 is a diagram illustrating a logarithmic search method for refining the location of a corner node;
FIG. 18 is a diagram further illustrating the method of FIG. 17;

FIG. 19 is a diagram illustrating a method of incorporating illumination changes during motion tracking;

FIG. 20A-20E is a diagram illustrating hierarchical hexagonal search method.

DETAILED DESCRIPTION OF INVENTION

Referring to FIG. 1, there is illustrated a computer system 1 for implementing the present invention. Although the computer system 1 is shown for the purpose of illustrating a preferred embodiment, the present invention is not limited to the computer system 1 shown, but may be used on any electronic processing system (for example a SPARC-20 workstation). The computer system 1 includes a microprocessor based unit 2 for receiving and processing software programs and for performing other well known processing functions. The software programs are contained on a computer usable medium 3, typically a disk, and are inputted into the microprocessor based unit 2 via the disk 3. A display 4 is electronically connected to the microprocessor based unit 2 for displaying user related information associated with the software. A keyboard 5 is also connected to the microprocessor based unit 2 for allowing a user to input information to the software. As an alternative to using the keyboard 5, a mouse 6 may be used for moving an icon 7 on the display 4 and for selecting an item on which the icon 7 overlays, as is well known in the art. A compact disk-read only memory (CD-ROM) 8 is connected to the microprocessor based unit 2 for receiving software programs and for providing a means of inputting the software programs and other information to the microprocessor based unit 2. A compact disk (not shown) typically contains the software program for inputting into the CD-ROM 8. A printer 9 is connected to the microprocessor based unit 2 for printing a hardcopy of the output of the computer system 1.

The below-described steps of the present invention are implemented on the computer system 1, and are typically contained on a disk 3 or other well known computer usable medium. Referring to FIGS. 2 and 3, there are illustrated five steps of the present invention which are first succinctly outlined and later described in detail. Briefly stated, these five steps are as follows: (i) selection of a reference frame and a reference polygon 10; (ii) fitting a 2-dimensional mesh inside the reference polygon 20; (iii) tracking the corners of the polygon in the previous frame 30; (iv) mapping the previous mesh onto the polygon in the current frame 40; and (v) local motion estimation via a hexagonal search and corner refinement 50.

A. Selection Of The Reference Frame And Polygon (Step 10)

Referring to FIGS. 2 and 3(a), in the first step 10, the user selects an object (i.e., the reference object) 11 within any frame (i.e., the reference frame) 14 that is to be tracked for eventual replacement with another object, which is hereinafter referred to as the replacement object 17. A convex reference polygon 12 is placed over the reference object 11 so that the boundary of the reference polygon 12 coincides with the boundary of the reference object 11. The user creates the reference polygon 12 by selecting corners 13 of the reference polygon 12 for defining its boundary. It is advantageous to model the boundary of the reference object 11 as a polygon for two reasons. First, the polygon can be of any size, and secondly, by tracking only corners, it can be determined how the boundary of the reference object 11 has moved from one frame to another frame.

It is instructive at this point to clarify some of the notation used herein, which is as follows. T denotes the total number of frames in the image sequence in which the reference object 11 is to be tracked. For convenience, the user renumbers these frames, typically starting with 1. In this regard, f_n denotes the renumbered frames, P_n denotes the polygon in f_n, and M_n denotes the mesh in f_n, where 1 ≤ n ≤ T. Furthermore, r denotes the sequence number of the reference frame 14, and f_r, P_r, and M_r respectively denote the reference frame 14, the reference polygon 12, and the reference mesh 21. Finally, L denotes the number of corners of P_r.

The order of processing is arbitrary and does not affect the performance of the method. Preferably, the forward time direction is first chosen so that the reference mesh 21 is tracked in frames with sequence numbers (r+1), (r+2), ..., T, and then in frames with sequence numbers (r-1), (r-2), ..., 1.

B. Fitting a 2-D Mesh Into The Reference Polygon (Step 20)

Referring to FIGS. 2, 3 and 4, the next step 20 involves fitting a mesh 21 to the reference polygon 12, called the reference mesh 21, that is subsequently tracked. It is the subdivisions of the mesh 21 that allow for tracking regions that exhibit locally varying motion, such as those corresponding to objects within a particular scene having either curved or deforming surfaces or both in combination. The subdivisions, or patches 22, of the reference mesh 21 are defined by the locations of the nodes 23 of the reference mesh 21. For example, FIG. 4 shows a depiction of a triangular mesh 21 fit into a quadrilateral reference polygon 12.

To create the reference mesh 21, the reference polygon 12 is first placed on a regular rectangular grid. The dimensions of each rectangle in the regular rectangular grid are specified by the user. The non-rectangular cells (e.g., trapezoids, pentagons, etc.) that are formed along the boundary of the reference polygon 12 are divided into an appropriate number of triangles as shown in FIG. 4. If it is desired that the reference mesh 21 contain only triangular elements, each rectangular cell is further divided into two triangular elements. Thus, the reference mesh 21 consists of patches 22 that are of the same size except for the ones that are around the boundary of the reference polygon 12. It is instructive to note that the nodes 23 are also corners of the patches 22. As may be obvious to those skilled in the art, the mesh 21 is completely described by the collection of its patches 22.

Referring to FIG. 2, once the reference mesh 21 is determined, the frame number n is set to r+1, step 25a, and if n ≤ T 26a, frame f_n is read in step 27a. Once the reference mesh 21 is tracked in frame f_n using the Steps 30a, 40a, and 50a, the frame number n is incremented by 1 in 28a and the incremented value is compared with T in step 26a. The Steps 26a, 27a, 30a, 40a, 50a, and 28a are repeated until n > T. When n > T, the frame number n is set to r-1 in 25b and compared with 1 in step 26b. If n ≥ 1, frame f_n is read in step 27b. Then, the reference mesh 21 is tracked in frame f_n using the Steps 30b, 40b, and 50b, and the frame number n is decreased by 1 in step 28b. Steps 26b, 27b, 30b, 40b, 50b, and 28b are repeated until n < 1.

Referring to FIGS. 2 and 3(b), hereinafter, f_n is called the current frame 114, P_n is called the current polygon 112, and M_n is called the current mesh 121. The previous frame 214 refers to the frame f_{n-1} if n > r, or to the frame f_{n+1} if n < r. It is instructive to note that for both n = r+1 and n = r-1, the previous frame 214 is in fact the reference frame 14. Furthermore, the tracked reference polygon 12 in the previous frame 214 is called the previous polygon 212, and the tracked reference mesh 21 in the previous frame 214 is called the previous mesh 221.

C. Tracking Corners Of The Polygon (Step 30)

The corners 213 of the previous polygon 212 are independently tracked into the current frame f_n, 114, as shown in FIG. 5, to find an initial estimate of the current polygon 112. This initial estimate is then refined using the corner refinement process as explained later in this section.

Referring to FIGS. 2, 3, 5, 6, 7, and 8, the corner tracking method includes the following steps: (1) selecting a motion model for the corners 13, (2) assigning a cost polygon 31 to each corner 13, (3) finding the best motion parameters for each cost polygon 231 in the previous frame 214 using a logarithmic search method, and mapping the corners 213 of the previous frame 214 into the current frame f_n 114 with the best motion parameters found for their respective cost polygons 231. In the following, we give a detailed description of these steps.

C1. Selecting A Motion Model For A Corner (Step 30)

Depending on the local motion around the corners 13, one selects for each corner 13 one of the following models: (i) translation, (ii) translation, rotation, and zoom, (iii) affine, (iv) perspective, and (v) bilinear. The translational model is the simplest one, and should preferably be selected if this model is applicable, as one well skilled in the art can determine. If the translational model is selected, the user will be required to specify a rectangular search region to indicate the range of translational displacement for the corner 13.

If the local motion around a corner 13 involves either rotation or zoom or both in combination, which is easily determined by one well skilled in the art, we employ the second model, i.e., translation, rotation, and zoom. In this case, in addition to a rectangular search region for the translational part of the motion, the user will be required to specify a second rectangular search region to indicate the range of rotation and zoom.

On the other hand, if there is shear and/or directional scaling of pixels around a corner 13, we employ the third motion model, namely the affine motion model. In order to find the affine motion parameters, the user will need to specify three rectangular search regions: one for the translational part, one for the rotation and zoom part, and one for the shear and directional scaling.

Finally, if perspective or nonlinear deformations are observed in the neighborhood of the corner 13, we employ the fourth or the fifth motion model, respectively. For both the fourth and the fifth models, the user will specify four rectangular search regions: three of them will determine the extent of the affine motion, and the remaining one will determine the amount of perspective or bilinear deformation. As will be explained later, as the complexity of the motion model increases, i.e., as the number of search regions increases, so do the computational requirements for finding the best parameters for the model. Therefore, in order to reduce the computational requirements, it is preferred to use during corner tracking, step 30, as simple a motion model as allowed by the characteristics of the motion around the corner 13.

The size of each search region is defined in integer powers of 2. For each search region, the user specifies two integers, one to indicate the horizontal dimension and one to indicate the vertical dimension of the search region. Thus, if h and v denote the integers specified by the user to respectively indicate the horizontal and vertical dimensions of the search region, then the size of the search region is given by 2^{h+2} × 2^{v+2}.

C2. Defining The Cost Polygon For A Corner

In order to track a corner 213 of the previous polygon 212 into the current frame 114, the user is required to specify a region 31 of pixels around each corner 13 in the reference frame 14 that permits the motion model selected for that corner 13. This region is specified in the form of a polygon, and hereinafter defined as the cost polygon 31. The cost polygon 31 can be defined as a rectangular block centered around the corner 13 as shown in FIG. 6(a), or it can be defined as a polygon having the corner 13 as one of its corners while completely remaining inside the reference polygon 12. In the latter case, one possible choice for the cost polygon 31 is a scaled-down version of the reference polygon 12 placed at the corner 13 as shown in FIG. 6(b). It is instructive to note that the size of the cost polygon 31 should be as large as possible provided that the pixels within the cost polygon 31 permit the same motion parameters.

In the following, K denotes the number of search regions specified by the user. As indicated earlier, the number of search regions is determined by the complexity of the motion model. Let C denote the cost polygon 31, and let L denote the total number of its corners 33, e.g., L=4 in FIG. 6(a), and L=5 in FIG. 6(b). Each search region is assigned to a distinct corner 33 of the cost polygon 31. Thus, it is required that K ≤ L. Referring to FIG. 7, the corners 33 that are assigned a search region are called moving corners (MC) 32. Obviously, if K=L, then all corners 33 will be moving corners 32. The moving corners 32 are numbered from 1 to K in the order of increasing motion complexity (i.e., MC_1 introduces translational motion; MC_2 introduces rotation and zoom if K ≥ 2; MC_3 introduces shear and directional scaling if K ≥ 3; and MC_4 introduces perspective or bilinear deformation if K=4). One possible definition for the moving corners 32 is given by

MC_i = C_j, i = 1, 2, ..., K, (1)

[the expression giving the index j in terms of i, K, and L is not legible in the source]

where ⌊x⌋ denotes the largest integer not greater than x, and C_j stands for the jth corner of C. FIG. 7 shows an example for L=5 and K=3 [the corner indices assigned to MC_1, MC_2, and MC_3 are not legible in the source].

C3. Finding The Best Motion Parameters For A Cost Polygon

Referring to FIGS. 5, 6, 7 and 8, the method for tracking a corner 213 of the previous polygon 212 into the current frame 114 is as follows. The following is repeated for each corner 213 of the previous polygon 212.

Let r_i, i=1, ..., K, denote the locations of the moving corners (MC_i, i=1, ..., K) 32 of the cost polygon 31 in the reference frame 14, and let p_i, i=1, ..., K, respectively denote the initial locations of the moving corners 132 of the cost polygon 131 in the current frame 114. The initial locations of the moving corners 132 in the current frame are obtained from the locations of the moving corners 232 of the cost polygon 231 in the previous frame 214.

Let D* denote the best mapping for the corner 213, i.e., D* is the mapping that gives the best location of the corner 213 in the current frame 114. In the following, a method is given to determine D*. Let I_r denote the intensity distribution in the reference frame 14, and let I_c denote the intensity distribution in the current frame 114. Let h_i, v_i, i=1, ..., K, denote the integers that respectively determine the size of the search regions 34 for MC_i, i=1, ..., K. The user also specifies the accuracy of the search as an integer power of 2. Let a denote the accuracy of the search.

We are now ready to describe, step by step, the logarithmic search method used for corner tracking. A demonstration of the following is given in FIG. 8.
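Before the numbered steps, a minimal sketch of the simplest case (K = 1, pure translation) may help fix ideas. It is an illustration under stated assumptions, not the patented procedure itself: the step size at each level is assumed to be 2^h horizontally and 2^v vertically, the matching error is assumed to be a sum of absolute intensity differences, and `a` is treated as the smallest exponent searched. The function names `sad` and `log_search_translation` are hypothetical.

```python
def sad(ref, cur, block, dx, dy):
    """Assumed matching error: sum of absolute differences between the
    reference pixels in `block` and the current pixels shifted by (dx, dy)."""
    return sum(abs(ref[y][x] - cur[y + dy][x + dx]) for x, y in block)


def log_search_translation(ref, cur, block, h, v, a=0):
    """Coarse-to-fine 9-point search for a translation (dx, dy).

    ref, cur : images as lists of rows of intensities
    block    : list of (x, y) pixels of the cost region in the reference frame
    h, v     : exponents of the initial horizontal and vertical step sizes
    a        : smallest exponent to search (assumed meaning of "accuracy")
    """
    height, width = len(cur), len(cur[0])
    dx = dy = 0
    while h >= a or v >= a:
        step_x = 2 ** h if h >= a else 0
        step_y = 2 ** v if v >= a else 0
        best = None
        # (0, 0) is visited first so that ties keep the current estimate.
        for m in (0, -1, 1):
            for n in (0, -1, 1):
                cx, cy = dx + m * step_x, dy + n * step_y
                inside = all(0 <= x + cx < width and 0 <= y + cy < height
                             for x, y in block)
                if not inside:
                    continue  # skip candidates that leave the image
                err = sad(ref, cur, block, cx, cy)
                if best is None or err < best[0]:
                    best = (err, cx, cy)
        _, dx, dy = best
        h -= 1  # halve both step sizes for the next level
        v -= 1
    return dx, dy
```

Each level evaluates nine candidates, so the cost grows with the number of levels rather than with the area of the search region, which is the reason for calling the search logarithmic.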






5,982,909 



1. MC_1 is moved to 9 different locations in the current frame 114 that are given by

q_{1;m_1,n_1} = [expression not legible in the source], m_1, n_1 = -1, 0, 1. (2)

2. If K=1, we find the translational mappings

$D_{1;m_1,n_1}: r_1 \rightarrow q_{1;m_1,n_1}, \quad m_1, n_1 = -1, 0, 1$ (3)

and compute the matching error value for each mapping,

[Equation (4) is not legible in the source.]

The best translational mapping is the one for which the matching error value is the smallest, i.e.,

[Equation (5) is not legible in the source.]

We then move to Step 6. If K ≥ 2, however, we let

$t_k = (m_1, n_1, \ldots, m_k, n_k), \quad k = 1, \ldots, K,$ (6)

for notational simplicity, and continue.

3. We let k=2 and find the following 9 translational mappings

[Equation (7) is not legible in the source.]

4. We move MC_k to the following 9^{k-1} locations in the current frame 114.

$q_{k;t_{k-1}} = M_{k-1;t_{k-1}}\, p_k, \quad m_1, n_1, \ldots, m_{k-1}, n_{k-1} = -1, 0, 1.$ (8)

For each q_{k;t_{k-1}} we compute the following 9 different locations

[expression not legible in the source], m_k, n_k = -1, 0, 1. (9)

If k < K we find the mappings

[Equation (10) is not legible in the source; its indices range over -1, 0, 1.]

increment k by 1, and repeat Step 4; otherwise, i.e., if k=K, we continue.

5. We find the following 9^K mappings

[Equation (11) is not legible in the source.]

and compute for each one of them the matching error values

$E_{t_K} = \sum_{x \in C} \left| I_r(x) - I_c(D_{t_K} x) \right|, \quad m_1, n_1, \ldots, m_K, n_K = -1, 0, 1.$ (12)

The best mapping is the one for which the matching error value is the smallest, i.e.,

[Equation (13) is not legible in the source.]

6. We let t_K* = (m_1*, n_1*, ..., m_K*, n_K*) denote the index values that correspond to D*. We decrement the values of h_1, v_1, ..., h_K, v_K by 1. If h_i, v_i < a for all i=1, ..., K, we have found the best model parameters with the desired accuracy and thus we stop. Otherwise we let

[update expression not legible in the source]

and go to Step 1 to implement the next level of the logarithmic search.

Once the best model parameters D* are found, the corner 13 of the reference polygon 12 is mapped to the current frame 114 with D*, and the above procedure is repeated for the remaining corners 13 of the reference polygon 12.

A method for finding D* is given in V. Seferidis and M. Ghanbari, "General Approach to Block-Matching Motion Estimation," Optical Engineering, volume 32 (7), pp. 1464-1474, July 1993. The presently disclosed method is an improvement over "General Approach to Block-Matching Motion Estimation," because (1) the effect of each component of the motion model on the transformations of the cost polygon 31 is controlled by the search region associated with that component of the motion model, and (2) non-convex polygons are less likely to occur during the logarithmic search process due to the cumulative nature of the movements of the corners 32 of the cost polygon 31. Furthermore, the error expressions (Equations 4 and 12) used in the presently disclosed method can be modified according to C.-S. Fuh and P. Maragos, "Affine models for image matching and motion detection," in IEEE International Conference Acoustic Speech and Signal Processing, pp. 2409-2412, May 1991, Toronto, Canada, so that illumination changes in the scene are also incorporated during corner tracking, step 30.

D. Mapping The Previous Mesh (Step 40)

Referring to FIG. 9, once an initial estimate of the current polygon 112 is determined in step 30, the next step 40 is to map the previous mesh 221 into the current polygon 112 to obtain an initial estimate for the current mesh 121. An initial estimate of the current mesh 121 is obtained by mapping the nodes 223 of the previous mesh 221 into the current polygon 112 using a set of affine transformations as follows.

Let M_p and M_c respectively denote the previous mesh 221 and the current mesh 121, with c and p respectively denoting the numbers of the current 114 and previous 214 frames, such that

$1 \le c,\ r \le T \quad \text{and} \quad p = \begin{cases} c - 1, & \text{if } c > r \\ c + 1, & \text{if } c < r \end{cases}$ (14)

where r denotes the reference frame number. Let P_p and P_c respectively denote the previous polygon 212 and the current polygon 112. In order to compute the aforementioned affine transformations, first divide P_p and P_c into (L-2) triangles 45. Then the triangular division can be formulated as follows:

$R_{e,k} = (P_{e,1},\ P_{e,k+1},\ P_{e,k+2}), \quad \text{for } k = 1, \ldots, L-2, \quad e = p, c,$ (15)

where R_{e,k} is the kth triangle on P_e. Divide P_p and P_c as in (15) and find the affine transformations A_k: R_{p,k} → R_{c,k}, k=1, ..., L-2. All nodes g_n 223 of M_p for n=1, ..., N, where N is the total number of nodes 223, are visited sequentially and
mapped into the current polygon 112 as A_k(g_n) if g_n ∈ R_{p,k}.
Mapping the corners 223 of a patch 222 in the previous
polygon 212 to the corners 123 of a patch in the current
polygon 112 is shown in FIG. 9. Based on the properties of
affine transformation as explained in G. Wolberg, "Digital
Image Warping," IEEE Computer Society Press, 1992, Los
Alamitos, Calif., which is herein incorporated by reference,
if a node 23 is on the boundary of two triangles 45, it is
mapped to the same location by the affine transformations
obtained for both triangles. The current mesh 121
constructed from the previous mesh M_p 221 using the affine
transformations as defined above is called the initial current
mesh 121 in the current frame 114, and is refined by using
the method given in the following section.
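The node mapping of step 40 can be sketched as follows. This is a minimal illustration, assuming that the convex polygon is divided into its (L-2) triangles as a fan from the first corner, and that each node is carried over using its barycentric coordinates, which is equivalent to applying the affine transformation fixed by the three corner correspondences of its triangle. The function names are hypothetical.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of point p with respect to triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    wa = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    wb = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return wa, wb, 1.0 - wa - wb


def fan_triangles(polygon):
    """Corner-index triples of the L-2 fan triangles of a convex polygon
    (assumed division; indices are 0-based)."""
    return [(0, k, k + 1) for k in range(1, len(polygon) - 1)]


def map_nodes(nodes, prev_poly, cur_poly, eps=1e-9):
    """Map mesh nodes from the previous polygon into the current polygon."""
    mapped = []
    for g in nodes:
        for i, j, k in fan_triangles(prev_poly):
            w = barycentric(g, prev_poly[i], prev_poly[j], prev_poly[k])
            if min(w) >= -eps:  # g lies inside or on the edge of this triangle
                a, b, c = cur_poly[i], cur_poly[j], cur_poly[k]
                mapped.append((w[0] * a[0] + w[1] * b[0] + w[2] * c[0],
                               w[0] * a[1] + w[1] * b[1] + w[2] * c[1]))
                break
        else:
            raise ValueError("node lies outside the previous polygon")
    return mapped
```

A node on an edge shared by two triangles has a zero weight for the opposite corner in both, so either triangle yields the same mapped location, which matches the boundary property noted above.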
E. Hexagonal Search and Corner Refinement (Step 50)

Referring to FIGS. 10 through 18, an efficient search strategy is employed to refine the initial current mesh 121 on the current frame 114. This allows for handling image regions containing locally varying motion, i.e., image regions corresponding to scene objects with curved surfaces or surfaces that undergo mild deformations (i.e., deformations that do not cause self occlusion of parts of the object). The invention also discloses a method to account for possible changes in the illumination in the scene. The detailed description of this method is furnished later in this Section.

Let N and M respectively denote the number of nodes 23 and patches 22 in the reference mesh 21. Also, let g_i and r_j respectively denote the ith node 123 and the jth patch 122 in the current mesh 121, where i=1, ..., N and j=1, ..., M. Each patch 122 in the current mesh 121 is allowed to go through spatial warpings that are either affine or bilinear by moving the nodes 123 of the mesh 121. Affine and bilinear warpings are discussed in detail in G. Wolberg, "Digital Image Warping," IEEE Computer Society Press, 1992, Los Alamitos, Calif. Affine mapping assumes three point correspondences and has six parameters:

$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$ (16)

where (x,y) and (u,v) denote the coordinates of a point before and after the affine mapping is applied, respectively. An affine map maps a rectangular block into an arbitrary parallelogram, giving shear, scaling, rotation and translation to it. Bilinear mapping assumes four point correspondences and has eight parameters:

$u = a_0 + a_1 x + a_2 y + a_3 xy, \quad v = b_0 + b_1 x + b_2 y + b_3 xy.$ (17)

Note that affine mapping is obtained from bilinear mapping by setting a_3=0 and b_3=0 in (17). For even further details, the book "Digital Image Warping" can be referenced.

Our method uses the affine mapping when the reference mesh 21 includes only triangular patches 22. When the reference mesh 21 includes only rectangular patches 22, our method uses only the bilinear transformation. It is also possible that the reference mesh 21 contains both triangular and rectangular patches 22, in which case our method employs the affine transformation for the triangular patches 22, and the bilinear transformation for the rectangular patches 22.

Due to the ratio preserving properties of the bilinear and affine transformations, the image intensity distribution along the sides of the patches 122 in the current frame 114 always remains continuous as the nodes 123 are moved. In this step, the corners 113 of the current polygon 112 are also refined, as will be described in detail below.

Referring to FIG. 4, three different types of nodes 23 are identified on the reference mesh 21. They are as follows: nodes 51 that are inside the polygon 12 (inside nodes), nodes 52 that are on the boundary of the polygon 12 (boundary nodes), and nodes 53 that are at the corners of the polygon 12 (corner nodes). Once the initial current mesh 121 is obtained, the positions of inside 51, boundary 52, and corner 53 nodes on the current mesh 121 are refined so that the difference in the intensity distribution between the current polygon 112 and its prediction from the reference polygon 12 is minimized. In order to refine the positions of the inside nodes 51, the hexagonal search approach is used as disclosed in Y. Nakaya and H. Harashima, "Motion compensation based on spatial transformations," IEEE Transactions Circuits and System Video Technology, vol. 4, pp. 339-357, June 1994. It is an iterative displacement estimation method that evaluates candidate spatial transformations. Using hexagonal search, Y. Nakaya and H. Harashima refine the positions of only the inside nodes 51. The positions of boundary nodes 52 are refined using a block matching approach in J. Niewglowski, T. Campbell, and P. Haavisto, "A Novel Video Coding Scheme Based on Temporal Prediction Using Digital Image Warping," IEEE Transactions Consumer Electronics, vol. 39, pp. 141-150, August 1993. The present invention refines the positions of the boundary nodes 52 using a variation of the hexagonal search method. Since boundary nodes 52 that are on the same line must remain on the same line, their motion must be restricted to a line space during the hexagonal search. Thus, for a boundary node 52 the search space is one-dimensional while it is two-dimensional for an inside node 51.

E1. Refining the Locations of the Inside Nodes 51

The position of each inside node 51 in the initial current mesh 121 is refined in an arbitrary order. Referring to FIG. 10, let G be the current inside node 51 whose position is to be refined and let S_1, S_2, ..., S_K denote the patches 122 surrounding G 51 on the current mesh 121, where K is the number of patches 122 for G. Let the corresponding patches 22 on M_r 21 be denoted by S_{r1}, S_{r2}, ..., S_{rK}. The first step of the hexagonal search for node G 51 is to find the region in the current frame 114 that will be affected from the movement of G 51. This region is called the cost polygon 54 and denoted by S, where

$S = \bigcup_{i=1}^{K} S_i.$



The cost polygon 54 for node G 51 in the cunent frame 114 
can be generated very easily using the following steps: 

1. Set i=0 and create an empty point list, 

2. Let i<-i+l, 

3. Construct patch S-, let z=size of S„ 

4. Find the corner index, j, of G on patch S,-, 

5. From k-j+1 to k-j+z-1 append the (k mod z)'th comer 
of S,- to point list if it is not already in the list. 

6. If i<K go to step 2. 

7. The points in the list will be clockwise ordered. 
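
The seven steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the function name `cost_polygon` and the data layout are assumptions. Each patch is assumed to be a tuple of node identifiers listed in clockwise order, and the patches themselves are assumed to be supplied in clockwise order around G.

```python
def cost_polygon(patches, g):
    """Collect the outline of the union of the patches that surround node g.

    patches -- patches around g, each a tuple of node ids in clockwise
               order (hypothetical layout, assumed for this sketch)
    g       -- id of the node whose cost polygon is wanted
    """
    points = []                        # step 1: empty point list
    for patch in patches:              # steps 2, 3 and 6: visit S_1 .. S_K
        z = len(patch)                 # size of the patch
        j = patch.index(g)             # step 4: corner index of G
        for k in range(j + 1, j + z):  # step 5: every corner except G itself
            corner = patch[k % z]
            if corner not in points:
                points.append(corner)
    return points                      # step 7: clockwise under the assumptions


# Node 0 surrounded by six triangles gives a hexagonal cost polygon.
triangles = [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 4, 5), (0, 5, 6), (0, 6, 1)]
hexagon = cost_polygon(triangles, 0)
```

With six triangular patches the list holds six outline nodes, which matches the hexagon shown for inside nodes in FIG. 10.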

If the reference polygon 12 is rectangular, and triangular
patches 22 are used, then all cost polygons 54 turn out to be
hexagons, as shown in FIG. 10, for inside nodes 51, hence
the name "hexagonal search". During the search, node G 51
in FIG. 10(a) is moved to a new location as shown in FIG.
10(b) in a search space, updating the deformations of the
triangular patches 122 inside the cost polygon 54. The
updated patches 122 in FIG. 10(b) are called $S'_i$'s
($S'_i \leftarrow S_i$).

The predicted image inside the cost polygon 54 is
synthesized by warping undeformed patches 22, $S_{ri}$'s, on the
reference frame 14, onto the deformed patches 122, $S'_i$'s, on
the current frame. The mean absolute difference (MAD) or
mean square difference (MSE), formulated in Equation 18, is
calculated inside the cost polygon 54.

$$E(x) = \frac{1}{\sum_{k=1}^{K} N_k} \sum_{k=1}^{K} \sum_{(i,j) \in S'_k} \left| I_c(i,j) - I_r\big(T_k(i,j)\big) \right|^m \tag{18}$$

In Equation 18, (i,j) denotes a pixel location on patch $S'_k$
and MAD or MSE values are calculated by setting m to 1 or
2, respectively. In the same equation $T_k$ denotes the
backward spatial transformation $T_k: S_{rk} \leftarrow S'_k$ and $N_k$ denotes the
total number of pixels inside $S'_k$. The position of G 51 that
minimizes the MAD or MSE value is registered as the new
location for G 51.

Usually, pixels fall onto non-integer locations after they
are mapped using spatial transformations. Intensities for
pixels that are on non-integer locations are obtained using
bilinear interpolation. In bilinear interpolation, intensity
values of the four neighboring pixel locations are employed
as explained in the following. Assume that a pixel located at
(i,j) on $f_c$ is mapped to (x,y) on $f_r$. If a and b are the largest
integers that respectively are not greater than x and y, and if
$\xi = x - a$ and $\eta = y - b$, then bilinear interpolation is formulated
as follows:

$$I_c(i,j) = (1-\xi)(1-\eta)\, I_r(a,b) + (1-\xi)\,\eta\, I_r(a,b+1) + \xi\,\eta\, I_r(a+1,b+1) + \xi\,(1-\eta)\, I_r(a+1,b) \tag{19}$$

The present invention discloses a method to speed up the
original hexagonal search, and referring to FIG. 12, involves
finding a search polygon 55 and a search space 58 as
explained below. The search polygon for node G 51 in the
current frame 114 is found using the following steps:

1. Set i=0 and create an empty point list,

2. Let i←i+1,

3. Construct patch $S_i$, let z=size of $S_i$,

4. Find the corner index, j, of G on patch $S_i$,

5. If z=4, patch is rectangular, append (j+1 mod z)th and
(j+3 mod z)th corners of $S_i$ to point list if they are not
already in the list.

6. Else if z=3, patch is triangular, append (j+1 mod z)th
and (j+2 mod z)th corners of $S_i$ to point list if they are
not already in the list.

7. If i<K go to step 2.

8. Order the points in the list in clockwise order.

9. Let F denote the polygon formed by the points found
in Step 8. If F is convex, then F is the search polygon
55. If F is not convex, then find the largest convex
polygon in F such that G 51 can move within F without
causing any overlaps as shown in FIG. 11. We note that
this operation is different from finding the convex hull
of a polygon (which is well known in the art).

The search space 58 is obtained by introducing a square
window 57 around G, whose size is specified by the user.
Intersection of the square window 57 with the search
polygon 55 gives the search space 58. Examples of search
polygon, search window, and search space are shown in FIG.
12 when the patches 122 around G 51 are triangular (a) and
when the patches 122 around G 51 are quadrilateral (b). In
the following, let A denote the search space 58.

The optimum location for G 51 is found using a
logarithmic method which reduces the computational load,
especially when subpixel accuracy is applied. The block diagram
for logarithmic search is shown in FIG. 13.

Let δ denote step size at each step of logarithmic search.
In the first step 60 of logarithmic search, δ is set to an initial
step size, specified by the user as a power of two, and the
level number k is set to 0, step 61. Let $g_k$ denote the image
coordinates of node G 51 at level k of the logarithmic search,
where $g_1$ is set to the location of G 51, step 62, and

$$k_{\max} = 1 + \log_2\left(\frac{\text{initial step size}}{\text{accuracy}}\right),$$

where the value of accuracy is specified by the user.
Increment k by 1, step 64, and if k=1, step 65, obtain the
following locations by sampling the search space, A, with δ,
step 66a:

$$x_{ij} = g_k + \delta \begin{bmatrix} i \\ j \end{bmatrix}, \quad \text{where } i, j \text{ are integers and } x_{ij} \in A. \tag{20}$$

At each sample location 70, shown in FIG. 14, the image
intensities inside the cost polygon are synthesized, and the
prediction errors, $E(x_{ij})$'s, computed. The sample location 70
giving the lowest prediction error is kept as the new location
for G, i.e., let $g_{k+1} = x^*$, step 68, such that $E(x_{ij})$ is minimum
for $x^*$, step 67. Set

$$\delta \leftarrow \frac{\delta}{2},$$

step 69, and k←k+1, step 64. In the subsequent levels of the
logarithmic search, i.e., for k>1, the search space 58 is
limited to a 3x3 neighborhood 71 of $g_k$ at each level. This
3x3 neighborhood is given by, step 66b,

$$x_{ij} = g_k + \delta \begin{bmatrix} i \\ j \end{bmatrix}, \quad i, j = -1, 0, 1, \text{ provided } x_{ij} \in A. \tag{21}$$

We let $g_{k+1} = x^*$, step 68, such that $E(x_{ij})$ is minimum for
$x^*$, step 67, and set

$$\delta \leftarrow \frac{\delta}{2},$$

step 69, and k←k+1, step 64. Logarithmic search is stopped
once the desired accuracy is reached, i.e., when δ=accuracy,
step 63. A simple realization of logarithmic search is shown
in FIG. 14. In the exhaustive search strategy introduced in
Nakaya, et al., the initial sampling rate, s, is set to accuracy and
only the first stage of our logarithmic search method, i.e.,
k=1, is applied. Assuming an NxN search window 57 and a
desired accuracy value of a, there are up to

candidates 70 for G 51 in an exhaustive search, compared to

where ⌊x⌋ denotes the largest integer not greater than x, and
s denotes the initial sampling rate, in the presently disclosed
logarithmic search method. Thus, for example, for N=9, s=2,
a=1/8, the presently disclosed logarithmic search approach is
nearly 83 times faster than Nakaya, et al.
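
The coarse-to-fine idea behind the logarithmic search can be illustrated with a short Python sketch. It is a simplified sketch under stated assumptions, not the flow of FIG. 13: the names `logarithmic_search`, `error`, `in_space` and `half_width` are hypothetical, the prediction error is passed in as a callable instead of being computed by warping patches, and the search space is described by a membership test.

```python
def logarithmic_search(g, error, in_space, half_width, step, accuracy):
    """Coarse-to-fine search for the location that minimizes `error`.

    g          -- (x, y) start location of the node
    error      -- callable returning the prediction error at a location
    in_space   -- callable telling whether a location is in the search space
    half_width -- half the side of the square search window (assumed input)
    step       -- initial step size, a power of two
    accuracy   -- final step size, a power of two not larger than `step`
    """
    best = g
    n = int(half_width // step)        # level 1 samples the whole window
    while step >= accuracy:
        candidates = [(best[0] + i * step, best[1] + j * step)
                      for i in range(-n, n + 1)
                      for j in range(-n, n + 1)]
        candidates = [x for x in candidates if in_space(x)]
        if candidates:
            best = min(candidates, key=error)
        step, n = step / 2, 1          # later levels use a 3x3 neighborhood
    return best


# Toy error surface with its minimum at (1.25, -0.75).
found = logarithmic_search(
    g=(0, 0),
    error=lambda p: (p[0] - 1.25) ** 2 + (p[1] + 0.75) ** 2,
    in_space=lambda p: abs(p[0]) <= 4 and abs(p[1]) <= 4,
    half_width=4, step=2, accuracy=0.25)
```

In this toy run the first level evaluates 25 grid points and each of the three later levels evaluates at most 9, whereas sampling the same window directly at the final accuracy would require 33 x 33 evaluations.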

For a boundary node 52, the search space 58 is limited to
a line as shown in FIG. 15. The cost polygon 54 for a
boundary node 52 is formed in the same way as for an inside
node 51, i.e., the cost polygon 54 for a boundary node 52 is
the outline of the union of the patches 122 that have the
boundary node 52 as one of their corners. The search
polygon for a boundary node 52 is defined to be the line
segment whose end points are the nodes 77, 78 that are
neighbors to the boundary node 52 (G) on each side. The
movement of G 52 is also limited by a rectangular search
window 57 centered around G 52, whose size is specified by
the user. The intersection of the search polygon 55 and the
search window 57 results in the search space 58, which is
denoted as B. A similar logarithmic method is then applied
to boundary node G 52 in terms of a distance measure as
explained below.

Let δ denote step size at each step of logarithmic search.
In the first step 80 of logarithmic search δ is set to an initial
step size, specified by the user as a power of two, and the
level number k is set to 0, step 81. Let $g_k$ denote the image
coordinates of node G at level k of the logarithmic search,
where $g_1$=G, step 82, and

$$k_{\max} = 1 + \log_2\left(\frac{\text{initial step size}}{\text{accuracy}}\right)$$

and the value of accuracy is specified by the user. Increment
k by 1, step 84, and if k=1, step 85, obtain the following
locations 74 by sampling the search space B 58 with δ, step
86a:

$$x_i = g_k + i\,\delta\,u, \quad i \text{ is an integer such that } x_i \in B. \tag{22}$$

Let $g_{k+1} = x^*$, step 88, such that $E(x_i)$ is minimum for $x^*$, step
87. Set

$$\delta \leftarrow \frac{\delta}{2},$$

step 89, and k←k+1, step 84. In the subsequent levels of the
logarithmic search, i.e., for k>1, the search space is limited
to 3 locations 75 in the neighborhood of $g_k$. These locations
75 are calculated as, step 86b,

$$x_i = g_k + i\,\delta\,u, \quad i = -1, 0, 1, \text{ provided } x_i \in B. \tag{23}$$

The logarithmic search is stopped once the desired
accuracy is reached. The flow diagram of the logarithmic search
method is shown in FIG. 16.
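
A one-dimensional variant of the same search can be sketched as follows. The sketch assumes that the candidate locations lie on the line through the two neighboring nodes and that the symbol u in Equations 22 and 23 is a unit vector along that line; this reading, together with the names `boundary_search`, `half_window` and `error`, is an assumption made for illustration only.

```python
import math


def boundary_search(g, p, q, error, half_window, step, accuracy):
    """Search along the segment p-q for the best location of boundary node g.

    g, p, q     -- (x, y) of the node and of its two neighbors on the boundary
    error       -- callable returning the prediction error at a location
    half_window -- half the side of the search window centered on g (assumed)
    step        -- initial step size, a power of two
    accuracy    -- final step size, a power of two not larger than `step`
    """
    ux, uy = q[0] - p[0], q[1] - p[1]
    length = math.hypot(ux, uy)
    ux, uy = ux / length, uy / length          # unit vector along the line

    def point(t):                              # location at signed distance t from g
        return (g[0] + t * ux, g[1] + t * uy)

    def in_space(t):                           # on the segment and in the window
        x, y = point(t)
        s = (x - p[0]) * ux + (y - p[1]) * uy
        return (0 <= s <= length
                and abs(x - g[0]) <= half_window
                and abs(y - g[1]) <= half_window)

    best, n = 0.0, math.ceil(2 * half_window / step)
    while step >= accuracy:
        ts = [best + i * step for i in range(-n, n + 1)]
        ts = [t for t in ts if in_space(t)]
        if ts:
            best = min(ts, key=lambda t: error(point(t)))
        step, n = step / 2, 1                  # later levels test 3 locations
    return point(best)


moved = boundary_search(g=(4, 0), p=(0, 0), q=(10, 0),
                        error=lambda xy: abs(xy[0] - 6.5),
                        half_window=4, step=2, accuracy=0.5)
```

Because every candidate is generated on the line itself, the node cannot leave the boundary edge, which is the constraint the text places on boundary nodes.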

The hexagonal search process is iteratively applied to the
nodes 123 of the mesh 121 as explained in Nakaya, et al.
Due to the nature of the hexagonal search mechanism, the
number of nodes 123 whose positions are refined during one
iteration will decrease with the increasing number of
iterations. During an iteration, the nodes of the mesh 121 can be
visited in a fixed or random order. Iterations are stopped
when there is no node whose position needs to be refined or
when a maximum number of iterations has been reached.
Due to repeated warpings of patches 122, this iterative
method is computationally expensive; however, it is possible
to process up to one third of the nodes in parallel as
suggested in Nakaya, et al.

The present invention also refines the location of each
corner node 53. The refinement is performed at each
iteration of the hexagonal search. The corner nodes are refined
after or before all boundary 52 and inside nodes 51 are
refined at each iteration. This step is introduced to refine the
corner locations obtained as a result of corner tracking.

Let $c_i$ denote the i'th corner of the current polygon 112
which is also a corner node 53 of the current mesh 121. The
problem in moving a corner node 53 during the refinement
process is in defining a cost polygon 54 for a corner node 53.
The cost polygon 54 has to be inside the current polygon
112, close to the corner node 53, and interact with the mesh
structure. This invention introduces two methods, which are
called the "local method" and the "global method". Both
methods are based on constructing a point list and a patch list as
explained below. In the local method, the point list is initialized
with the corner node 53 ($c_i$) and the nodes in the current
mesh 121 that are connected to $c_i$. Then, the nodes in the
current mesh 121 that are connected to at least two of the
nodes in the initial list are also added to the point list (this
is because a quadrilateral can be diagonally divided into two
triangles in two different ways, and the nodes that are
connected to a corner node can be different for each case).
In the global method, the point list is constructed in a different
way. Referring to FIG. 17, in this case, a triangle 90, denoted
as H, is formed by joining the previous, current and next
corners of the current polygon in clockwise order, i.e.,
$H = c_{i-1} c_i c_{i+1}$; we call this triangle the "reference corner
triangle". All the nodes of the mesh 121 that lie on or inside
this triangle 90 form the point list. Once the point list is
constructed using either one of the methods discussed
above, a list of patches in the mesh 121 that will be affected
by the movements of all nodes in the point list is
constructed. The patch list is formed by the patches in the mesh
121 that have as a corner at least one of the nodes in the point
list.
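
The construction of the point list and the patch list in the global method can be sketched as below. This is a minimal sketch under assumptions: the mesh is stored as a dictionary from node id to (x, y) and a list of patches given as tuples of node ids, and the helper names `point_in_triangle` and `corner_lists` are hypothetical.

```python
def point_in_triangle(p, a, b, c):
    """True if p lies on or inside triangle abc (either orientation)."""
    def cross(o, u, v):
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])

    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def corner_lists(nodes, patches, prev_c, cur_c, next_c):
    """Point list and patch list for corner `cur_c` (global method).

    nodes   -- dict mapping node id to (x, y)
    patches -- list of patches, each a tuple of node ids
    prev_c, cur_c, next_c -- ids of the previous, current and next corner
    """
    triangle = (nodes[prev_c], nodes[cur_c], nodes[next_c])
    point_list = [n for n, xy in nodes.items()
                  if point_in_triangle(xy, *triangle)]
    patch_list = [patch for patch in patches
                  if any(n in point_list for n in patch)]
    return point_list, patch_list


# 3x3 grid of nodes (id = 3*y + x) carrying four square patches.
grid = {3 * y + x: (x, y) for y in range(3) for x in range(3)}
squares = [(0, 1, 4, 3), (1, 2, 5, 4), (3, 4, 7, 6), (4, 5, 8, 7)]
points, affected = corner_lists(grid, squares, prev_c=6, cur_c=0, next_c=2)
```

In this small example the center node lies on the long side of the triangle, so it enters the point list and all four patches end up in the patch list.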

A logarithmic search strategy similar to the ones discussed
for the hexagonal search method is applied for finding
the best location for the corner node 53. The search space 58
of the corner node 53 is defined by a square window around
$c_i$ and denoted as
D. The definition of the search space 58 remains the same for
each iteration of hexagonal search.

Referring to FIG. 18, at the first step of logarithmic search
for the corner refinement, the step size δ is set to half of the
range, step 91, and the level number k is set to 0. Let $S_k$
denote the k'th patch in the patch list, where k=1,2, . . . , K
and K denotes the total number of patches in the patch list.
Also let $c_{ik}$ denote the coordinate of the corner $c_i$ at the k'th step
of logarithmic search, where $c_{i1} = c_i$, step 93, and

$$k_{\max} = \log_2\left(\frac{\text{range}}{\text{accuracy}}\right).$$

Increment k by 1, step 95, and obtain the following 9
locations by sampling the search space with δ, step 96:



$$x_{ij} = c_{ik} + \delta \begin{bmatrix} i \\ j \end{bmatrix}, \quad i, j = -1, 0, 1, \text{ provided } x_{ij} \in D. \tag{24}$$

When the corner $c_i$ moves to a new location x, another
triangle 100, which we call the "moved half triangle" and
denote by H', where $H' = c_{i-1}\, x\, c_{i+1}$, is formed. Patches in the
patch list are mapped using the affine transformation, $A_x$,
between H and H'. After the affine mapping is applied to the
patches, deformed patches denoted as $S'_k$ are obtained for
k=1,2, . . . , K. Using the mappings between the undeformed
patches on the reference mesh, $S_{rk}$, and the deformed patches
$S'_k$, the intensity distribution on all patches is predicted.
Using (18), E(x) is calculated, which is used as an error
criterion for corner movement. The location $x^*$ where $E(x_{ij})$
is minimum for i,j=-1,0,1 is kept as $c_{i,k+1}$, step 97. The step
size is halved, step 99, and the logarithmic search is continued
until the step size is less than the accuracy, step 94. After
$k_{\max}$ has been reached, logarithmic search is stopped. The
reference half triangle and the effect of corner movement can
be seen in FIG. 17.

The current corner's movement can also be performed
exhaustively. Then all locations calculated as

$$x_{ij} = c_i + \delta \begin{bmatrix} i \\ j \end{bmatrix}, \quad i, j = -\text{range}, \ldots, 0, \ldots, \text{range}, \text{ provided } x_{ij} \in D,$$

are tested for smallest matching error (18). The location that
minimizes the prediction error E(x) is chosen as the new
location for $c_i$ among the $x_{ij}$'s.

In the logarithmic method, the initial step for corner
movement is set to half of the range and at each iteration this
step size is reduced to its half. The flow diagram for
logarithmic search for a corner is shown in FIG. 18.

A corner node 53 is visited, i.e., tested for possible
refinement in its position, if the current iteration is the first
one or a neighboring node in the mesh 121 has moved before
the corner node 53 is visited. Corners are visited by two
different strategies, called the "bottom-up" and "top-down"
approaches. In the bottom-up approach, corner nodes 53 are
visited after all the boundary 52 and inside 51 nodes are
visited in every iteration. In the top-down approach, corner
nodes 53 are visited before the boundary 52 and inside 51
nodes are visited in every iteration.

When a corner node 53 is moved to a new location during
logarithmic search, the boundary of the current polygon 112
is changed and all the nodes of the mesh 121 that are inside
the reference half triangle 90 are mapped with the affine
mapping between the reference half triangle 90 and the new
half triangle 100 formed when the corner node 53 has moved.

Incorporating Illumination Changes

In the present invention, displacement vector $d_x$ for a pixel
location x in a patch 22 is calculated bilinearly from the
displacement vectors of the corners of the patch 22.
Referring to FIG. 19, given a triangular patch ABC 22 and a pixel
x inside the patch 22, the pixel location can be written as

(25)

(26)

where a gives the position of point A.

If $d_a$, $d_b$, $d_c$ denote the displacements of corners A, B, C,
respectively, the displacement of pixel x, $d_x$, is calculated using

(27)

If the patch 22 is rectangular, given the displacements of
each corner, the displacement of a pixel x inside the rectangle
ABCD is calculated using

(28)

where

(29)

A pictorial representation of interpolation for triangular and
rectangular patches 22 is shown in FIG. 19.

In C.-S. Fuh and P. Maragos, "Affine models for image
matching and motion detection," in IEEE International
Conference Acoustic Speech and Signal Processing, pp.
2409-2412, May 1991, Toronto, Canada, a method is
disclosed to model the effects of illumination change on the
image intensity distribution. The method disclosed in C.-S.
Fuh and P. Maragos employs the following error criterion:

(30)

where r is referred to as the multiplicative illumination
coefficient and c is referred to as the additive illumination
coefficient.

In order to minimize Equation 30, C.-S. Fuh and P.
Maragos first find optimal r and c by setting the partial
derivatives ∂MSE/∂r and ∂MSE/∂c to 0. This yields two
linear equations in r and c which can be solved to find
optimal r* and c* as a function of $d_x$. Letting $I_c(x) = I$ and
$I_r(x + d_x) = J$, optimal solutions are given as follows:

(31)

where all the summations are over x in patch 22.

In C.-S. Fuh and P. Maragos, the values of the
illumination coefficients r and c are assumed constant for all x in
patch 22. However, in most image sequences, illumination
also changes spatially within a frame. The present invention
discloses a new method to overcome this problem. Rather
than assigning a pair (r,c) of illumination coefficients to each
patch 22 on the mesh 21, the presently disclosed method
assigns a pair (r,c) of illumination coefficients to each node
23 on the mesh 21. The presently disclosed invention allows
illumination to continuously vary within a patch 22 by
obtaining the illumination at location x from the illumination
coefficients assigned to the corners of the patch 22 using
bilinear interpolation, which is given below for a triangular
patch 22:

$$r_x = (1-p-q)\, r_a + p\, r_b + q\, r_c,$$
$$c_x = (1-p-q)\, c_a + p\, c_b + q\, c_c, \tag{32}$$

where $(r_a, c_a)$, $(r_b, c_b)$, and $(r_c, c_c)$ are the illumination
coefficients respectively assigned to corners A, B, and C, and
$(r_x, c_x)$ corresponds to the illumination at point x in patch 22.

In order to estimate the illumination coefficients (r, c) for
each node, the present invention discloses two different
methods. In the first method, r and c are assumed to be
constant inside the cost polygon 54 of an inside or boundary
node, and values for r and c found as a result of hexagonal
search are assigned to the node. For a corner node 53, the
presently disclosed invention assigns a weighted average of
r and c values calculated on the patches which have the
corner node as one of their corners. The weights are
determined by the area of each patch.
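
As an illustration of the area-weighted average used for a corner node, the following Python sketch computes patch areas with the shoelace formula and combines per-patch coefficients. The function names and the input layout (a list of `(vertices, r, c)` tuples) are assumptions made for this sketch; the text states only that the weights are determined by the patch areas.

```python
def polygon_area(vertices):
    """Area of a simple polygon given as a list of (x, y) vertices."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        area += x0 * y1 - x1 * y0          # shoelace formula
    return abs(area) / 2


def corner_illumination(patches):
    """Area-weighted average of (r, c) over the patches sharing a corner.

    patches -- list of (vertices, r, c) tuples, one per patch that has the
               corner node as one of its corners (hypothetical layout)
    """
    total = sum(polygon_area(v) for v, _, _ in patches)
    r = sum(polygon_area(v) * r_k for v, r_k, _ in patches) / total
    c = sum(polygon_area(v) * c_k for v, _, c_k in patches) / total
    return r, c


unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]      # area 1
wide_patch = [(0, 0), (3, 0), (3, 1), (0, 1)]       # area 3
r_corner, c_corner = corner_illumination(
    [(unit_square, 1.0, 0.0), (wide_patch, 2.0, 4.0)])
```

The larger patch contributes three times the weight of the smaller one, so the averaged coefficients sit closer to its values.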

In the second method, which is called the "interpolation
method", r and c values are allowed to continuously vary for
each pixel location inside the cost polygon 54 during
hexagonal search. This method is used only for inside and
boundary nodes. The bilinear interpolation method that has
been mentioned above is applied for calculating these
values. Let G denote a node and let K denote the number of
patches that are in the cost polygon associated with G. Let
$S_k$, k=1,2, . . . , K denote the patches that are in the cost
polygon associated with node G. Let $G_k$, k=1, . . . , K, in
clockwise (or counter-clockwise) order represent the nodes
of the mesh that are on the cost polygon associated with
node G. Then, the error criterion used in the second method
is given by:

(33)

where
and

where $(r_{G_k}, c_{G_k})$ are the illumination coefficients for node $G_k$.
In the above expression $(r_{G_k}, c_{G_k})$ are fixed and assumed to
be known. During the first iteration of the search, it is
possible that some $G_k$'s are not visited prior to G. To
overcome this problem, the present invention assigns initial
values to illumination coefficients on every node in the
mesh. The initial values for the multiplicative and additive
illumination coefficients are either respectively set equal to
1.0 and 0.0, or set to their values calculated on the
previous image.
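
The per-pixel interpolation of illumination coefficients inside a triangular patch can be sketched as follows. The sketch assumes that the weights p and q are the barycentric weights defined by x = (1-p-q)a + p b + q c, where a, b, c are the positions of corners A, B, C; that definition is an assumption here, since the corresponding equations are not legible in this copy, and the function names are hypothetical.

```python
def barycentric_weights(x, a, b, c):
    """Weights (p, q) with x = (1 - p - q) * a + p * b + q * c (assumed form)."""
    det = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    p = ((x[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (x[1] - a[1])) / det
    q = ((b[0] - a[0]) * (x[1] - a[1]) - (x[0] - a[0]) * (b[1] - a[1])) / det
    return p, q


def interpolate_illumination(x, corners, coefficients):
    """Illumination pair (r, c) at pixel x inside a triangular patch.

    corners      -- positions (a, b, c) of the patch corners A, B, C
    coefficients -- pairs ((r_a, c_a), (r_b, c_b), (r_c, c_c)) at the corners
    """
    p, q = barycentric_weights(x, *corners)
    (r_a, c_a), (r_b, c_b), (r_c, c_c) = coefficients
    r_x = (1 - p - q) * r_a + p * r_b + q * r_c
    c_x = (1 - p - q) * c_a + p * c_b + q * c_c
    return r_x, c_x


triangle = ((0, 0), (1, 0), (0, 1))
pair = interpolate_illumination(
    (0.25, 0.5), triangle, ((1.0, 0.0), (2.0, 4.0), (3.0, 8.0)))
```

At a corner the weights reduce to that corner's own pair, so the interpolated illumination is continuous across patches that share nodes.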

Hierarchical Hexagonal Search

The present invention also discloses a method, henceforth
called "hierarchical hexagonal search method," to
implement the method of (E) in a hierarchy of spatial resolutions.
Referring to the example given in FIG. 20, once the
reference mesh 21 is tracked into the current frame 114 and the
current mesh 121 is obtained, new inside and boundary
nodes, henceforth called high-resolution nodes 141, are
added to the mesh 121 halfway on each link 140 that
connects two nodes 123 in the mesh 121. Once the
high-resolution nodes are added to the mesh 121, step 50 of the
present invention is repeated to further refine the locations of
the low-resolution nodes 123 and the locations of the
high-resolution nodes 141. Once step 50 is completed with
high-resolution nodes in the mesh 121, still higher resolution
nodes can be added to the mesh 121 and step 50 then
repeated any number of times. At this point, it is important
to note that only the original low-resolution nodes 123 are
mapped to a subsequent frame in step 40 to find the initial
low-resolution mesh in the subsequent frame.
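
Adding high-resolution nodes halfway along each link can be sketched as below. The data layout (a dictionary of node positions keyed by integer id and a list of links given as id pairs) and the function name `add_high_resolution_nodes` are assumptions for illustration; rebuilding the patches around the new nodes is not shown.

```python
def add_high_resolution_nodes(nodes, links):
    """Insert one new node at the midpoint of every link of the mesh.

    nodes -- dict mapping integer node id to (x, y)
    links -- list of (id_a, id_b) pairs, each link listed once
    Returns the enlarged node dict and a dict mapping each link to the id
    of the node created on it.
    """
    refined = dict(nodes)                  # leave the low-resolution mesh intact
    next_id = max(nodes) + 1
    midpoints = {}
    for a, b in links:
        (ax, ay), (bx, by) = nodes[a], nodes[b]
        refined[next_id] = ((ax + bx) / 2, (ay + by) / 2)
        midpoints[(a, b)] = next_id
        next_id += 1
    return refined, midpoints


coarse = {0: (0, 0), 1: (2, 0), 2: (0, 2)}
fine, created = add_high_resolution_nodes(coarse, [(0, 1), (1, 2), (2, 0)])
```

Keeping the original dictionary untouched mirrors the remark that only the low-resolution nodes are carried over to the next frame.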

The advantages of the hierarchical hexagonal search
method are: (1) it is less sensitive to the initial patch size
selected by the user, as the size of the patches is varied
during the hierarchical search process; and (2) it is
computationally faster, as large local motion can be tracked in less
time with larger patches than with smaller patches, and smaller
patches can track local motion in less time when their initial
location is determined by the movement of large patches.

Synthetic Transfiguration

One important application of the invention is in the area
of synthetic object transfiguration where an object, such as
the contents of a billboard, is replaced by a new object and
rendered throughout the sequence in the same manner as the
original object. The application of the method to synthetic
transfiguration is described below.

Referring to FIG. 3(a), once the 2-D meshes $M_1, \ldots, M_T$
are found which represent the global and local motion of the
reference object 11 to be replaced, first, the reference mesh
$M_r$ 21 is mapped onto $P_R$ 19 using a spatial transformation
between $P_R$ and the reference polygon $P_r$, where the
polygon $P_R$ 19 defines the boundary of the replacement object 17.
For the transfiguration application, it is required that the
spatial transformation between $P_r$ and $P_R$ can be represented
by an affine transformation. Using the transformation
between $P_R$ 19 and $P_r$ 12, the reference mesh $M_r$ 21 is
mapped onto $f_R$ 18 to obtain the mesh $M_R$ 16 on the
replacement object 17. Then, the following backward
mappings are computed

$$H_{n,m}: M_n^m \rightarrow M_R^m, \quad m = 1, 2, \ldots, N, \quad n = 1, 2, \ldots, T, \tag{34}$$

where $H_{n,m}$ is the backward mapping between the m'th patch
on the n'th mesh, $M_n$, and the m'th patch on the replacement
mesh, $M_R$, N denotes the total number of patches in each
mesh, and T denotes the number of frames in the given
image sequence.

If illumination changes are observed during the process of
mesh tracking, they are also incorporated on the transfigured
object 17 using

$$I_n(x) = r_x\, I_R(H_{n,m}\, x) + c_x \quad \text{for all } x \in M_n^m, \tag{35}$$

where $I_n$ and $I_R$ respectively denote the image intensity
distribution at the n-th and replacement frame. The
multiplicative illumination coefficient $r_x$ and the additive
illumination coefficient $c_x$ are obtained by bilinearly interpolating
the multiplicative and additive illumination coefficients
found for the corners of the patch $M_n^m$ during the process
of mesh tracking. The details of the bilinear interpolation
used for computing $r_x$ and $c_x$ are disclosed above.
We claim: 

1. A method for tracking a first predetermined,
two-dimensional portion of an image throughout a sequence of
images, the method comprising the steps of:

(a) selecting a reference frame;

(b) selecting the predetermined, two-dimensional portion
within the reference frame by choosing a reference
polygon having at least three corners that defines the
boundary of the first predetermined region;

(c) fitting a reference mesh having corner nodes at the
corners of the reference polygon and at least one inside
node inside the reference polygon;

(d) predicting the reference polygon in subsequent or
previous image frames by independently tracking the
corners of the reference polygon;

(e1) dividing the reference polygon and the tracked
polygon into a minimum number of triangles so that each
triangle in the reference polygon respectively
corresponds to a triangle in the tracked polygon;

(e2) finding parameters of affine transformation between
each corresponding pair of triangles; and

(e3) mapping nodes in each triangle of the reference
polygon into the respective triangle in the tracked
polygon using the parameters of the corresponding
affine transformation used for the triangle in which the
node is located;

(f) refining locations of the inside and corner nodes of the
corresponding mesh for tracking local and global
motion of the first predetermined portion, wherein the
steps (c) to (f) are implemented in a hierarchy of spatial
resolutions;

(g) refining the location of boundary nodes on the
reference mesh for tracking the local motion around the
boundary of the first predetermined portion;

(h) tracking illumination changes that occurred between
the reference frame and a previous or subsequent
frame; and

(i) replacing the first predetermined portion with a second
predetermined portion throughout a portion of the
sequence of images so that the second predetermined
portion undergoes the same global and local motion as
the first predetermined portion; wherein the corner,
inside and boundary nodes divide the reference mesh
into either triangular or rectangular patches or a
combination of both triangular and rectangular patches;
wherein step (d) includes: (d1) selecting a motion
model for the corner nodes; (d2) assigning a cost
polygon to each corner node; and (d3) estimating
parameters of the motion model for each cost polygon
and (d4) mapping the corner nodes with the estimated
motion parameters.

2. The method as in claim 1, wherein step (d3) further 
includes defining a maximum range for estimating the 
parameters of the motion model. 

3. The method as in claim 2, further comprising the step 
of refining the location of boundary nodes on the tracked 
polygon. 

4. The method as in claim 3 further comprising the step of
fitting a mesh to the second predetermined portion that
corresponds in nodes and patches to the mesh in the
reference polygon.

5. The method as in claim 4 further comprising the steps
of finding parameters of affine transformation between each
corresponding pair of patches in the second predetermined
portion and the tracked polygon in the previous and
subsequent image frames.

6. The method as in claim 5 further comprising the step of
mapping pixels in each patch in the second predetermined
portion into the corresponding patch in the tracked polygon
using the parameters of the corresponding affine
transformation.

7. An article of manufacture comprising: 
a computer usable medium having computer readable 50 

program means embodied therein for causing tracking 
of a first predetermined, two-dimensional portion of an 
image throughout a sequence of images, the computer 
readable program code means in said article of manu- 
facture comprising: 

(a) computer readable program means for causing the 
computer to effect selecting a reference frame; 

(b) computer readable program means for causing the 
computer to effect selecting the first predetermined, 



(d) computer readable program means for causing the 
computer to effect predicting the reference polygon 
in subsequent or previous image frames by indepen- 
dently tracking the comer of the reference polygon; 

(e) computer readable program means for dividing the 
reference polygon and the tracked polygon into a 
minimum number of triangles so that each triangle in 
the reference polygon respectively corresponds to a 
triangle in the tracked polygon; for finding param- 
eters of afifine transformation between each corre- 
sponding pair of triangles; and for mapping nodes in 
each triangle of the reference polygon into the 
respective triangle in the tracked polygon using the 
parameters of the corresponding affine transforma- 
tion used for the triangle in which the node is 
located; 

(f) computer readable program means for causing the 
computer to effect refining locations of the inside and 
comer nodes of the corresponding mesh for tracking 
local and global motion of the first predetermined 
portion; 

(g) means for causing the computer to effect defining 
the location of botmdary nodes on the reference 
mesh for tracking the local motion around the bound- 
ary of the predetermined portion; 

(h) means for causing the computer to effect tracking 
illumination changes that occurred between the ref- 
erence frame and a previous or subsequent frame; 

(i) means for causing said (c), (d), (e) and (f) computer 
readable program means to be implemented in a 
hierarchy of spatial resolutions; 

(j) means for causing the computer to effect replacing 
the first predetermined portion with a second prede- 
termined portion throughout a portion of the 
sequence of images so that the second predetermined 
portion undergoes the same global and local motion 
as the first predetermined portion, wherein the 
comer, inside and boundary nodes divide the refer- 
ence mesh into either triangular or rectangular 
patches or a combination of both triangular and 
rectangular patches; and 
(k) means for selecting a motion model for the comer 
nodes; for assigning a cost polygon to each comer 
node; for estimating parameters of the motion model 
for each cost polygon; and for mapping the comer 
nodes with the estimated motion parameters. 

8. The article of manufacture as in claim 7 further 
comprising computer readable program means for defining 
a maximum range for estimating the parameters of the 
motion model. 

9. The article of manufacture as in claim 8 further 
comprising computer readable program means for refining 
the location of boundary nodes on the tracked polygon. 

10. The article of manufacture as in claim 9 further 
comprising computer readable program means for fitting a 

55 mesh to the second predetermined portion that corresponds 
in nodes and patches to the mesh in the reference polygon. 

two-dimensional portion within the reference frame 
by choosing a reference polygon having at least three 
corners that defines the boundary of the first predetermined portion; 

(c) computer readable program means for causing the 
computer to effect fitting a reference mesh having 
corner nodes at the corners of the reference polygon 
and at least one inside the reference polygon; 

11. The article of manufacture as in claim 10 further 
comprising computer readable program means for finding 
parameters of affine transformation between each corresponding 
pair of patches in the second predetermined portion 
and the tracked polygon in the previous and subsequent 
image frames. 

12. The article of manufacture as in claim 11 further 
comprising computer readable program means for mapping 
pixels in each patch in the second predetermined portion into 
the corresponding patch in the tracked polygon using the 
parameters of the corresponding affine transformation. 

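
As an illustration of the patch-wise affine mapping recited in claims 11 and 12, the following minimal sketch assumes triangular patches, for which the six affine parameters are fixed exactly by the three vertex correspondences. The function names `affine_from_triangles` and `apply_affine` and the sample coordinates are hypothetical and are not taken from the specification.

```python
def affine_from_triangles(src, dst):
    """Return (a, b, c, d, e, f) with x' = a*x + b*y + c and y' = d*x + e*y + f,
    chosen so that each vertex of triangle `src` maps onto the matching vertex
    of triangle `dst`. Assumes triangular patches (an illustrative choice)."""
    (x1, y1), (x2, y2), (x3, y3) = src
    # Determinant of the 3x3 system with rows [x_i, y_i, 1].
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if abs(det) < 1e-12:
        raise ValueError("source triangle is degenerate")

    def solve(u1, u2, u3):
        # Cramer's rule for p*x_i + q*y_i + r = u_i, i = 1..3.
        p = (u1 * (y2 - y3) - y1 * (u2 - u3) + (u2 * y3 - u3 * y2)) / det
        q = (x1 * (u2 - u3) - u1 * (x2 - x3) + (x2 * u3 - x3 * u2)) / det
        r = (x1 * (y2 * u3 - y3 * u2) - y1 * (x2 * u3 - x3 * u2)
             + u1 * (x2 * y3 - x3 * y2)) / det
        return p, q, r

    a, b, c = solve(dst[0][0], dst[1][0], dst[2][0])  # x' coordinates
    d, e, f = solve(dst[0][1], dst[1][1], dst[2][1])  # y' coordinates
    return a, b, c, d, e, f


def apply_affine(params, point):
    """Map one pixel location with the affine parameters of its patch."""
    a, b, c, d, e, f = params
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


# Hypothetical patch in the second predetermined portion and its counterpart
# in the tracked polygon.
replacement_patch = [(0, 0), (1, 0), (0, 1)]
tracked_patch = [(2, 3), (4, 3), (2, 6)]
params = affine_from_triangles(replacement_patch, tracked_patch)
print(apply_affine(params, (0.5, 0.5)))  # (3.0, 4.5)
```

Each patch receives its own parameter set, so neighbouring patches may deform differently while still agreeing along their shared edges, because an affine map is determined on an edge by the two nodes at its ends.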
13. A method for tracking a first predetermined, two- 
dimensional portion of an image throughout a sequence of 
images, the method comprising the steps of: 

(a) selecting a reference frame; 

(b) selecting the predetermined, two-dimensional portion 
within the reference frame by choosing a reference 
polygon having at least three corners that defines the 
boundary of the first predetermined region; 

(c) fitting a reference mesh having corner nodes at the 
corners of the reference polygon and at least one inside 
node inside the reference polygon; 

(d) predicting the reference polygon in subsequent or 
previous image frames by independently tracking the 
corners of the reference polygon; 

(e) predicting a corresponding mesh in the subsequent or 
previous image frames by mapping the reference mesh 
into the tracked polygon using a plurality of different 
affine transformations; 

(f) refining locations of the inside and corner nodes of the 
corresponding mesh for tracking local and global 
motion of the first predetermined portion; wherein the 
steps (c) to (f) are implemented in a hierarchy of spatial 
resolutions; 



(g) refining the location of boundary nodes on the refer- 
ence mesh for tracking the local motion around the 
boundary of the first predetermined portion; 

(h) tracking illumination changes that occurred between 
the reference frame and a previous or subsequent 
frame; 

(i) replacing the first predetermined portion with a second 
predetermined portion throughout a portion of the 
sequence of images so that the second predetermined 
portion undergoes the same global and local motion as 
the first predetermined portion; wherein the corner, 
inside and boundary nodes divide the reference mesh 
into either triangular or rectangular patches or a com- 
bination of both triangular and rectangular patches; and 
wherein step (d) includes: (d1) selecting a motion 
model for the corner nodes; (d2) assigning a cost 
polygon to each corner node; and (d3) estimating 
parameters of the motion model for each cost polygon 
and (d4) mapping the corner nodes with the estimated 
motion parameters. 
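
The corner tracking of steps (d1) to (d4) can be sketched under simplifying assumptions that the claims do not impose: a purely translational motion model, a square window standing in for the cost polygon, a sum of absolute differences as the cost, and an exhaustive search bounded by a maximum range. The function `track_corner` and its parameters `half_size` and `max_range` are hypothetical names chosen for this illustration.

```python
def track_corner(ref, cur, corner, half_size=1, max_range=2):
    """Estimate the translation (dx, dy) of one corner node.

    ref, cur : grayscale frames as lists of rows of numbers (same size).
    corner   : (x, y) location of the corner in the reference frame.
    Assumptions (illustrative only): translational motion model, square
    cost region of side 2*half_size + 1 centred on the corner, and a
    sum-of-absolute-differences cost.
    """
    cx, cy = corner
    height, width = len(ref), len(ref[0])
    # Pixels of the cost region that lie inside the reference frame.
    region = [(x, y)
              for y in range(cy - half_size, cy + half_size + 1)
              for x in range(cx - half_size, cx + half_size + 1)
              if 0 <= x < width and 0 <= y < height]

    best, best_cost = (0, 0), float("inf")
    for dy in range(-max_range, max_range + 1):
        for dx in range(-max_range, max_range + 1):
            # Skip displacements that push the region outside the frame.
            if any(not (0 <= x + dx < width and 0 <= y + dy < height)
                   for x, y in region):
                continue
            cost = sum(abs(ref[y][x] - cur[y + dy][x + dx]) for x, y in region)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best
```

Under these assumptions the corner is then mapped by adding the returned displacement to its reference location, and each corner is processed independently of the others. Richer motion models would estimate more parameters over the same cost region.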